The Sketch Engine as infrastructure for historical corpora
Abstract
Part of the case for building a corpus is always that it will have many users and uses. For that, it must be easy to use. The Sketch Engine is a tool and web service that makes this easy. It is commercial, but this can be an advantage: it means that the costs and maintenance of the service are taken care of. All parties stand to gain: the resource developers have their resource showcased at no cost and also get to use it within the Sketch Engine themselves (often also at no cost), while other users benefit from the functions and features of the Sketch Engine. The tool already plays this role in relation to four historical corpora, three of which are briefly presented.

A premise of historical corpus development is that a corpus, once created, will be widely used. If it is not easy to use, this will not happen. In 2012, this means making it available to search over the web. You might do this by developing your own tool, by installing and using someone else’s, or by getting someone else to handle that whole side of things for you.
Similar resources
A Corpus Factory for Many Languages
For many languages there are no large, general-language corpora available. Until the web, all but the richest institutions could do little but shake their heads in dismay as corpus-building was long, slow and expensive. But with the advent of the Web it can be highly automated and thereby fast and inexpensive. We have developed a ‘corpus factory’ where we build large corpora. In this paper we d...
Large Scale Keyword Extraction using a Finite State Backend
We present a novel method for performing fast keyword extraction from large text corpora using a finite state backend. The FSA3 package has been adopted for this purpose. We outline the basic approach and present a comparison with the previous hash-based method used in the Sketch Engine.
Terminology finding in the Sketch Engine: an evaluation
The Sketch Engine is a leading corpus query tool, in use for lexicography at OUP, CUP, Collins and Le Robert, and at national language institutes of eight countries, and for teaching and research in many universities. Its distinctive feature is the ‘word sketch’, a one-page, automatic, corpus-derived summary of a word’s grammatical and collocational behaviour. Very large corpora and word sketch...
Sketching the Dependency Relations of Words in Chinese
We propose a language resource built by automatically sketching the grammatical relations of words, based on dependency parses of untagged texts. Compared to the Sketch Engine (Kilgarriff, Rychly, Smrz, & Tugwell, 2004), the advantage of word sketches based on parsed corpora is that they provide more detail about the different usages of each word, such as various types of modification, which is also important in la...
Studying Word Sketches for Russian
Corpora are undoubtedly vital tools for linguistic studies and for solving applied tasks. Useful as corpora are, further progress in linguistic research needs another kind of software, since it is impossible to process huge amounts of linguistic data manually. The Sketch Engine is a corpus tool which takes as input a corpus of any ...